feat: EventSource and Sensor HA without extra RBAC #1163
Conversation
@@ -332,26 +335,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
spec.Template.Spec.PriorityClassName = args.EventSource.Spec.Template.PriorityClassName
spec.Template.Spec.Priority = args.EventSource.Spec.Template.Priority
}
allEventTypes := eventsources.GetEventingServers(args.EventSource, nil)
We don't need this any more; with leader election, all the event source deployments can run with the rolling update strategy.
@@ -270,10 +275,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
MatchLabels: args.Labels,
},
Replicas: &replicas,
Strategy: appv1.DeploymentStrategy{
With leader election, sensor deployments can also run with the rolling update strategy.
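For context, the removed field is the explicit Recreate strategy; with a single active instance guaranteed by leader election, the deployments can rely on the Kubernetes default rolling update behaviour instead. A minimal sketch contrasting the two options (the helper function and its flag are illustrative, not code from this PR):

```go
package controllers

import (
	appv1 "k8s.io/api/apps/v1"
)

// buildStrategy is a hypothetical helper contrasting the two strategies
// discussed above; it is not part of the PR.
func buildStrategy(useLeaderElection bool) appv1.DeploymentStrategy {
	if !useLeaderElection {
		// Without leader election, Recreate guarantees the old pod is gone
		// before a new one starts, so at most one instance is ever active.
		return appv1.DeploymentStrategy{Type: appv1.RecreateDeploymentStrategyType}
	}
	// With leader election, only the elected pod does the work, so the
	// default RollingUpdate strategy is safe and avoids downtime on upgrades.
	return appv1.DeploymentStrategy{Type: appv1.RollingUpdateDeploymentStrategyType}
}
```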
@@ -284,54 +280,24 @@ func (e *EventSourceAdaptor) Start(ctx context.Context) error {
// EventSource object use the same type of deployment strategy
break
}
if !isRecreatType || e.eventSource.Spec.GetReplicas() == 1 {
if !isRecreatType {
Event sources like webhook should not run with leader election.
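To make that branch concrete, here is a sketch of the dispatch it implements; the helper names (runStandalone, runWithLeaderElection) and the flag name are illustrative rather than the PR's actual identifiers:

```go
package eventsources

import "context"

// Hypothetical stand-ins for the adaptor's real start paths.
func runStandalone(ctx context.Context) error         { return nil }
func runWithLeaderElection(ctx context.Context) error { return nil }

// start sketches the decision discussed above.
func start(ctx context.Context, isRecreateType bool) error {
	if !isRecreateType {
		// Active-active event sources such as webhook can serve from every
		// replica at once, so leader election is skipped entirely.
		return runStandalone(ctx)
	}
	// Recreate-type event sources must have exactly one active instance,
	// so the service runs under NATS-based leader election.
	return runWithLeaderElection(ctx)
}
```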
return PodsLogContains(ctx, kubeClient, namespace, regex, podList, timeout), nil
}

func PodsLogContains(ctx context.Context, kubeClient kubernetes.Interface, namespace, regex string, podList *corev1.PodList, timeout time.Duration) bool {
Watch logs from multiple Pods.
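A rough sketch of how such a helper can be implemented with client-go is below; it matches the signature shown in the diff but is an illustrative reimplementation, not the PR's actual body:

```go
package testutil

import (
	"bufio"
	"context"
	"regexp"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// PodsLogContains streams logs from every pod in podList and returns true as
// soon as any line from any pod matches the given regex, or false on timeout.
func PodsLogContains(ctx context.Context, kubeClient kubernetes.Interface, namespace, regex string, podList *corev1.PodList, timeout time.Duration) bool {
	cctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	re := regexp.MustCompile(regex)
	matchCh := make(chan bool, len(podList.Items))

	for _, p := range podList.Items {
		podName := p.Name
		go func() {
			// Follow the pod's log stream and scan it line by line.
			req := kubeClient.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{Follow: true})
			stream, err := req.Stream(cctx)
			if err != nil {
				return
			}
			defer stream.Close()
			scanner := bufio.NewScanner(stream)
			for scanner.Scan() {
				if re.MatchString(scanner.Text()) {
					matchCh <- true
					return
				}
			}
		}()
	}

	select {
	case <-cctx.Done():
		// Timed out (or parent context cancelled) before any match.
		return false
	case <-matchCh:
		return true
	}
}
```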
Signed-off-by: Derek Wang <[email protected]>
I'm in two minds about this. Leader election is a core cloud-native feature that is well understood by the Argo team and the community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leader election straightforward?
If we didn't have NATS in the architecture already, or if the HA were for controllers, there's no doubt k8s leader election would be the first choice. However, the HA here is for a dynamic service (EventSource or Sensor), and NATS happens to be there already, so I think we should utilize it, because it benefits us a lot - it saves lots of extra configuration. This approach requires the minimal spec change - it only needs to specify
And these extra changes are required at least per namespace; if the user has a strong sense of security, he will probably specify

I understand the concern about the reliability of leader election implemented over NATS; even though I have done all kinds of testing on it without seeing any issue, I still cannot say it's reliable enough. How about this: we can defer using leader election when
I think we should go with NATS now. It's a big ask of our users to add a lot of extra RBAC, which people struggle with.
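For reference, a minimal sketch of NATS-based leader election with nats-io/graft (the library this PR builds on); the cluster name, size, NATS URL, and log path are made-up example values, not the PR's configuration:

```go
package main

import (
	"log"

	"github.com/nats-io/graft"
	"github.com/nats-io/nats.go"
)

func main() {
	// All replicas of the same EventSource/Sensor join one RAFT group; the
	// group size corresponds to the number of replicas (illustrative values).
	ci := graft.ClusterInfo{Name: "example-sensor", Size: 2}

	opts := &nats.DefaultOptions
	opts.Url = nats.DefaultURL
	rpc, err := graft.NewNatsRpc(opts)
	if err != nil {
		log.Fatal(err)
	}

	errCh := make(chan error)
	stateCh := make(chan graft.StateChange)
	handler := graft.NewChanHandler(stateCh, errCh)

	node, err := graft.New(ci, handler, rpc, "/tmp/graft.log")
	if err != nil {
		log.Fatal(err)
	}
	defer node.Close()

	for {
		select {
		case sc := <-stateCh:
			if sc.To == graft.LEADER {
				log.Println("became leader: start the event source / sensor service")
			} else if sc.From == graft.LEADER {
				log.Println("lost leadership: stop the running service")
			}
		case err := <-errCh:
			log.Println("graft error:", err)
		}
	}
}
```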
select {
case <-ctx.Done():
log.Info("exiting...")
cancel()
normally defer a cancel?
When the current node changes from leader to non-leader (lines 128-133), cancel() needs to be called to terminate the running service, and a new cctx and cancel need to be re-initiated. Not quite sure if defer cancel() still works in that case; let me do more testing.
Using defer works, updated.
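A sketch of the pattern that was settled on: the inner context is recreated whenever leadership is regained, and a deferred closure releases whichever cancel is current when the outer context ends (leaderCh and startService are illustrative placeholders, not the PR's identifiers):

```go
package main

import (
	"context"
	"log"
)

// runWithLeadership sketches the leader/non-leader context handling discussed
// above. leaderCh delivers true when this node becomes leader and false when
// it loses leadership (a hypothetical channel, for illustration only).
func runWithLeadership(ctx context.Context, leaderCh <-chan bool, startService func(context.Context)) {
	var cctx context.Context
	cancel := func() {}
	// The deferred closure calls whatever the most recent cancel func is,
	// so nothing leaks when the outer context is done.
	defer func() { cancel() }()

	for {
		select {
		case <-ctx.Done():
			log.Println("exiting...")
			return
		case isLeader := <-leaderCh:
			if isLeader {
				// Became leader: create a fresh inner context and start serving.
				cctx, cancel = context.WithCancel(ctx)
				go startService(cctx)
			} else {
				// Lost leadership: stop the running service by cancelling the
				// inner context; a new one is created if leadership returns.
				cancel()
			}
		}
	}
}
```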
controllers/sensor/resource.go
Outdated
@@ -185,9 +185,14 @@ func buildDeployment(args *AdaptorArgs, eventBus *eventbusv1alpha1.EventBus) (*a
},
},
})
emptyDirVolName := "tmp-volume"
minor - maybe just call this tmp
done.
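For reference, wiring an emptyDir volume named tmp into the deployment spec roughly amounts to the sketch below (variable and function names are illustrative, mirroring the diff rather than copying it):

```go
package sensor

import (
	appv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// addTmpVolume adds an emptyDir volume named "tmp" and mounts it at /tmp in
// every container (a common pattern when the root filesystem is read-only).
func addTmpVolume(deploymentSpec *appv1.DeploymentSpec) {
	const volName = "tmp"
	deploymentSpec.Template.Spec.Volumes = append(deploymentSpec.Template.Spec.Volumes,
		corev1.Volume{
			Name:         volName,
			VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
		})
	for i := range deploymentSpec.Template.Spec.Containers {
		c := &deploymentSpec.Template.Spec.Containers[i]
		c.VolumeMounts = append(c.VolumeMounts, corev1.VolumeMount{Name: volName, MountPath: "/tmp"})
	}
}
```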
Signed-off-by: Derek Wang <[email protected]>
Signed-off-by: Derek Wang <[email protected]>
* feat: EventSource and Sensor HA without extra RBAC Signed-off-by: Derek Wang <[email protected]>
* feat: EventSource and Sensor HA without extra RBAC Signed-off-by: Derek Wang <[email protected]>
Signed-off-by: Derek Wang [email protected]
A new approach to do Active-Passive HA for Sensors and some of the EventSources, which does not require configuring extra RBAC. It utilizes the raft leader election algorithm implemented over NATS. See https://github.com/nats-io/graft.
Checklist: